{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5.2. 参数管理\n",
    "一旦我们选择了架构并设置了超参数，我们就进入了训练阶段。此时，我们的目标是找到使损失函数最小化的参数值。经过训练后，我们将需要使用这些参数来做出未来的预测。此外，有时我们希望提取参数，以便在其他环境中复用它们，将模型保存到磁盘，以便它可以在其他软件中执行，或者为了获得科学的理解而进行检查。\n",
    "\n",
    "大多数情况下，我们可以忽略声明和操作参数的具体细节，而只依靠深度学习框架来完成繁重的工作。然而，当我们离开具有标准层的层叠架构时，我们有时会陷入声明和操作参数的麻烦中。在本节中，我们将介绍以下内容：\n",
    "\n",
    "* 访问参数，用于调试、诊断和可视化。\n",
    "* 参数初始化。\n",
    "* 在不同模型组件间共享参数\n",
    "我们首先关注具有单隐藏层的多层感知机。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Tensor(shape=[2, 1], dtype=float32, place=CPUPlace, stop_gradient=False,\n",
       "       [[-0.34809890],\n",
       "        [-0.03128956]])"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import paddle\n",
    "from paddle import nn\n",
    "\n",
    "net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))\n",
    "X = paddle.rand([2, 4])\n",
    "net(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5.2.1. 参数访问\n",
    "我们从已有模型中访问参数。当通过Sequential类定义模型时，我们可以通过索引来访问模型的任意层。这就像模型是一个列表一样。每层的参数都在其属性中。如下所示，我们可以检查第二个全连接层的参数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "OrderedDict([('weight', Parameter containing:\n",
      "Tensor(shape=[8, 1], dtype=float32, place=CPUPlace, stop_gradient=False,\n",
      "       [[ 0.02201402],\n",
      "        [ 0.77933061],\n",
      "        [-0.37442669],\n",
      "        [ 0.07183987],\n",
      "        [-0.15619642],\n",
      "        [-0.67954248],\n",
      "        [ 0.27284920],\n",
      "        [-0.73753470]])), ('bias', Parameter containing:\n",
      "Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,\n",
      "       [0.]))])\n"
     ]
    }
   ],
   "source": [
    "print(net[2].state_dict())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5.2.1.1. 目标参数\n",
    "注意，每个参数都表示为参数（parameter）类的一个实例。要对参数执行任何操作，首先我们需要访问底层的数值。有几种方法可以做到这一点。有些比较简单，而另一些则比较通用。下面的代码从第二个神经网络层提取偏置，提取后返回的是一个参数类实例，并进一步访问该参数的值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'paddle.fluid.framework.ParamBase'>\n",
      "Parameter containing:\n",
      "Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,\n",
      "       [0.])\n",
      "<bound method PyCapsule.value of Parameter containing:\n",
      "Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,\n",
      "       [0.])>\n"
     ]
    }
   ],
   "source": [
    "print(type(net[2].bias))\n",
    "print(net[2].bias)\n",
    "print(net[2].bias.value)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "参数是复合的对象，包含值、梯度和额外信息。这就是为什么我们需要显式请求值的原因。\n",
    "\n",
    "除了值之外，我们还可以访问每个参数的梯度。由于我们还没有调用这个网络的反向传播，所以参数的梯度处于初始状态。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net[2].weight.grad == None"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5.2.1.2. 一次性访问所有参数\n",
    "当我们需要对所有参数执行操作时，逐个访问它们可能会很麻烦。当我们处理更复杂的块（例如，嵌套块）时，情况可能会变得特别复杂，因为我们需要递归整个树来提取每个子块的参数。下面，我们将通过演示来比较访问第一个全连接层的参数和访问所有层。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('weight', [4, 8]) ('bias', [8])\n",
      "('0.weight', [4, 8]) ('0.bias', [8]) ('2.weight', [8, 1]) ('2.bias', [1])\n"
     ]
    }
   ],
   "source": [
    "print(*[(name, param.shape) for name, param in net[0].named_parameters()])\n",
    "print(*[(name, param.shape) for name, param in net.named_parameters()])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这为我们提供了另一种访问网络参数的方式，如下所示。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Parameter containing:\n",
       "Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,\n",
       "       [0.])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net.state_dict()['2.bias']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5.2.1.3. 从嵌套块收集参数\n",
    "让我们看看，如果我们将多个块相互嵌套，参数命名约定是如何工作的。为此，我们首先定义一个生成块的函数（可以说是块工厂），然后将这些块组合到更大的块中。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Tensor(shape=[2, 1], dtype=float32, place=CPUPlace, stop_gradient=False,\n",
       "       [[0.00852812],\n",
       "        [0.        ]])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def block1():\n",
    "    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4),\n",
    "                         nn.ReLU())\n",
    "\n",
    "def block2():\n",
    "    net = nn.Sequential()\n",
    "    for i in range(4):\n",
    "        # 在这里嵌套\n",
    "        net.add_sublayer(f'block {i}', block1())\n",
    "    return net\n",
    "\n",
    "rgnet = nn.Sequential(block2(), nn.Linear(4, 1))\n",
    "rgnet(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在我们已经设计了网络，让我们看看它是如何组织的。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sequential(\n",
      "  (0): Sequential(\n",
      "    (block 0): Sequential(\n",
      "      (0): Linear(in_features=4, out_features=8, dtype=float32)\n",
      "      (1): ReLU()\n",
      "      (2): Linear(in_features=8, out_features=4, dtype=float32)\n",
      "      (3): ReLU()\n",
      "    )\n",
      "    (block 1): Sequential(\n",
      "      (0): Linear(in_features=4, out_features=8, dtype=float32)\n",
      "      (1): ReLU()\n",
      "      (2): Linear(in_features=8, out_features=4, dtype=float32)\n",
      "      (3): ReLU()\n",
      "    )\n",
      "    (block 2): Sequential(\n",
      "      (0): Linear(in_features=4, out_features=8, dtype=float32)\n",
      "      (1): ReLU()\n",
      "      (2): Linear(in_features=8, out_features=4, dtype=float32)\n",
      "      (3): ReLU()\n",
      "    )\n",
      "    (block 3): Sequential(\n",
      "      (0): Linear(in_features=4, out_features=8, dtype=float32)\n",
      "      (1): ReLU()\n",
      "      (2): Linear(in_features=8, out_features=4, dtype=float32)\n",
      "      (3): ReLU()\n",
      "    )\n",
      "  )\n",
      "  (1): Linear(in_features=4, out_features=1, dtype=float32)\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "print(rgnet)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因为层是分层嵌套的，所以我们也可以像通过嵌套列表索引一样访问它们。例如，我们下面访问第一个主要的块，其中第二个子块的第一层的偏置项。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Parameter containing:\n",
      "Tensor(shape=[8], dtype=float32, place=CPUPlace, stop_gradient=False,\n",
      "       [0., 0., 0., 0., 0., 0., 0., 0.])\n"
     ]
    }
   ],
   "source": [
    "print(rgnet[0].state_dict()['block 0.0.bias'])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5.2.2. 参数初始化\n",
    "我们知道了如何访问参数，现在让我们看看如何正确地初始化参数。我们在 4.8节 中讨论了良好初始化的必要性。深度学习框架提供默认随机初始化。然而，我们经常希望根据其他规则初始化权重。深度学习框架提供了最常用的规则，也允许创建自定义初始化方法。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "默认情况下，Paddle会根据一个范围均匀地初始化权重和偏置矩阵，这个范围是根据输入和输出维度计算出的。Paddle的nn.init模块提供了多种预置初始化方法。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5.2.2.1. 内置初始化"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "让我们首先调用内置的初始化器。下面的代码将所有权重参数初始化为标准差为0.01的高斯随机变量，且将偏置参数设置为0。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(Tensor(shape=[8], dtype=float32, place=CPUPlace, stop_gradient=False,\n",
       "        [ 0.28167605, -0.32783630,  0.50688285,  0.42611700,  0.12231040, -0.34835145, -0.61263883,  0.21643394]),\n",
       " Parameter containing:\n",
       " Tensor(shape=[8], dtype=float32, place=CPUPlace, stop_gradient=False,\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0.]))"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def init_normal(m):\n",
    "    if type(m) == nn.Linear:\n",
    "        paddle.nn.initializer.Normal(mean=0.0, std=0.01)\n",
    "        paddle.zeros(m.bias)    \n",
    "     \n",
    "\n",
    "net.apply(init_normal)\n",
    "net[0].weight[0],net[0].state_dict()['bias']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们还可以将所有参数初始化为给定的常数（比如1）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(Tensor(shape=[8], dtype=float32, place=CPUPlace, stop_gradient=False,\n",
       "        [ 0.28167605, -0.32783630,  0.50688285,  0.42611700,  0.12231040, -0.34835145, -0.61263883,  0.21643394]),\n",
       " Parameter containing:\n",
       " Tensor(shape=[8], dtype=float32, place=CPUPlace, stop_gradient=False,\n",
       "        [0., 0., 0., 0., 0., 0., 0., 0.]))"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def init_constant(m):\n",
    "    if type(m) == nn.Linear:\n",
    "        paddle.nn.initializer.Constant(value=1)\n",
    "        paddle.zeros(m.bias)\n",
    "     \n",
    "        #nn.init.normal_(m.weight, mean=0, std=0.01)\n",
    "        #nn.init.zeros_(m.bias)\n",
    "\n",
    "net.apply(init_normal)\n",
    "net[0].weight[0],net[0].state_dict()['bias']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们还可以对某些块应用不同的初始化方法。例如，下面我们使用Xavier初始化方法初始化第一层，然后第二层初始化为常量值42。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Tensor(shape=[8], dtype=float32, place=CPUPlace, stop_gradient=False,\n",
      "       [ 0.28167605, -0.32783630,  0.50688285,  0.42611700,  0.12231040, -0.34835145, -0.61263883,  0.21643394])\n",
      "Parameter containing:\n",
      "Tensor(shape=[8, 1], dtype=float32, place=CPUPlace, stop_gradient=False,\n",
      "       [[ 0.02201402],\n",
      "        [ 0.77933061],\n",
      "        [-0.37442669],\n",
      "        [ 0.07183987],\n",
      "        [-0.15619642],\n",
      "        [-0.67954248],\n",
      "        [ 0.27284920],\n",
      "        [-0.73753470]])\n"
     ]
    }
   ],
   "source": [
    "def xavier(m):\n",
    "    if type(m) == nn.Linear:\n",
    "        paddle.nn.initializer.XavierUniform(m.weight)\n",
    "       \n",
    "\n",
    "def init_42(m):\n",
    "    if type(m) == nn.Linear:\n",
    "        paddle.nn.initializer.Constant(42)\n",
    "        \n",
    "\n",
    "net[0].apply(xavier)\n",
    "net[2].apply(init_42)\n",
    "print(net[0].weight[0])\n",
    "print(net[2].weight)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5.2.2.2. 自定义初始化"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "有时，深度学习框架没有提供我们需要的初始化方法。在下面的例子中，我们使用以下的分布为任意权重参数 w 定义初始化方法：\n",
    "\n",
    "![](https://ai-studio-static-online.cdn.bcebos.com/400bc2321f1f4011bb117b26f543d793f9ac015511d64eda8717ceb09c2df4db)\n",
    "\n",
    "同样，我们实现了一个my_init函数来应用到net。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Init weight [4, 8]\n",
      "Init weight [8, 1]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/wugaosheng/anaconda3/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py:239: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.float32, but right dtype is paddle.bool, the right dtype will convert to paddle.float32\n",
      "  format(lhs_dtype, rhs_dtype, lhs_dtype))\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Tensor(shape=[2, 8], dtype=float32, place=CPUPlace, stop_gradient=True,\n",
       "       [[ 0.28167605, -0.32783630,  0.50688285,  0.42611700,  0.12231040, -0.34835145, -0.61263883,  0.21643394],\n",
       "        [ 0.13434005, -0.22188565,  0.09254831, -0.32003513, -0.01320523,  0.17332590, -0.24981803,  0.68207151]])"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def my_init(m):\n",
    "    if type(m) == nn.Linear:\n",
    "        print(\n",
    "            \"Init\",\n",
    "            *[(name, param.shape) for name, param in m.named_parameters()][0])\n",
    "        paddle.nn.initializer.XavierUniform(m.weight, -10, 10)\n",
    "        h=paddle.abs(m.weight)>=5\n",
    "        h=paddle.to_tensor(h)\n",
    "        m=paddle.to_tensor(m.weight)\n",
    "        m*=h\n",
    "        \n",
    "\n",
    "net.apply(my_init)\n",
    "net[0].weight[:2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "注意，我们始终可以直接设置参数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Tensor(shape=[8], dtype=float32, place=CPUPlace, stop_gradient=True,\n",
       "       [42.       , 0.67216372, 1.50688291, 1.42611694, 1.12231040, 0.65164852, 0.38736117, 1.21643400])"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net[0].weight[:] += 1\n",
    "net[0].weight[0, 0] = 42\n",
    "net[0].weight[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5.2.3. 参数绑定\n",
    "有时我们希望在多个层间共享参数。让我们看看如何优雅地做这件事。在下面，我们定义一个稠密层，然后使用它的参数来设置另一个层的参数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Tensor(shape=[8], dtype=bool, place=CPUPlace, stop_gradient=False,\n",
      "       [True, True, True, True, True, True, True, True])\n",
      "Tensor(shape=[8], dtype=bool, place=CPUPlace, stop_gradient=False,\n",
      "       [True, True, True, True, True, True, True, True])\n"
     ]
    }
   ],
   "source": [
    "# 我们需要给共享层一个名称，以便可以引用它的参数。\n",
    "shared = nn.Linear(8, 8)\n",
    "net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),\n",
    "                    shared, nn.ReLU(),\n",
    "                    shared, nn.ReLU(),\n",
    "                    nn.Linear(8, 1))\n",
    "net(X)\n",
    "# 检查参数是否相同\n",
    "print(net[2].weight[0] == net[4].weight[0])\n",
    "net[2].weight[0, 0] = 100\n",
    "# 确保它们实际上是同一个对象，而不只是有相同的值。\n",
    "print(net[2].weight[0] == net[4].weight[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这个例子表明第二层和第三层的参数是绑定的。它们不仅值相等，而且由相同的张量表示。因此，如果我们改变其中一个参数，另一个参数也会改变。你可能会想，当参数绑定时，梯度会发生什么情况？答案是由于模型参数包含梯度，因此在反向传播期间第二个隐藏层和第三个隐藏层的梯度会加在一起。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5.2.4. 小结\n",
    "* 我们有几种方法可以访问、初始化和绑定模型参数。\n",
    "* 我们可以使用自定义初始化方法。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5.2.5. 练习\n",
    "* 使用 5.1节 中定义的FancyMLP模型，访问各个层的参数。\n",
    "* 查看初始化模块文档以了解不同的初始化方法。\n",
    "* 构建包含共享参数层的多层感知机并对其进行训练。在训练过程中，观察模型各层的参数和梯度。\n",
    "* 为什么共享参数是个好主意？"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
